
Require compound terms for typed literal objects #151

Merged: justinjoy merged 2 commits into main from typed-literal-compound-terms on Jun 27, 2026

@justinjoy justinjoy commented Jun 27, 2026

Contributor

Two prompt-hardening changes to skills/factlog/references/text-to-fact.md
(the authoritative extraction criteria), plus a related stale-doc fix in the
factlog init template. All convert a soft "may" into a "must, when X", or
correct an out-of-date capability note.

1. Exhaustive extraction (completeness principle)

Dense tables — rosters, financial/registry status, budget line items,
schedules, career/patent records — are the highest-density fact source, yet the
prior criteria only said "record relation candidates." In practice the extractor
skimmed prose and dropped repeated table rows: a real proposal with ~400
extractable facts yielded ~90 (≈20–25% coverage).

  • forbid sampling of repeated items ("only a few representative ones" → extract all N)
  • table → triple mapping rule (row key→subject, header→relation, cell→object)
  • judge coverage by section/table sweep, not converted-file byte size
  • pre-finish self-check, PII exclusions preserved
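
The row key→subject, header→relation, cell→object rule above can be sketched as follows. This is a minimal illustration, not the extractor's actual code: `table_to_triples`, the `key_col` parameter, and the sample budget table are all hypothetical, and skipping empty cells is an assumption.

```python
from typing import Iterator, Sequence

Triple = tuple[str, str, str]


def table_to_triples(
    headers: Sequence[str],
    rows: Sequence[Sequence[str]],
    key_col: int = 0,
) -> Iterator[Triple]:
    """Yield (subject, relation, object) for every non-empty, non-key cell."""
    for row in rows:
        subject = row[key_col].strip()
        for col, (header, cell) in enumerate(zip(headers, row)):
            value = cell.strip()
            if col == key_col or not value:
                continue  # the key is the subject; empty cells carry no fact
            yield (subject, header.strip(), value)


# Hypothetical budget table: every row is swept, none is sampled.
headers = ["item", "amount", "year"]
rows = [
    ["equipment", "126", "2017"],
    ["travel", "14", "2017"],
    ["licenses", "", "2018"],
]
triples = list(table_to_triples(headers, rows))
for triple in triples:
    print(triple)
```

Because every row and every column is visited, the number of emitted triples tracks the number of filled cells, which is the kind of per-table sweep the coverage rule asks for.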

2. Typed-literal compound terms (not discretionary)

Date/amount/ordinal/number objects left as prose strings ("2017.03.08",
"126백만원", i.e. 126 million won) can't be sorted or thresholded by the engine.
Left to discretion, the extractor never emits compound terms (observed: 0
across a full sync).
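
A small illustration of why prose date strings do not order correctly, while a structured value does. This only demonstrates the motivation: `parse_dotted_date` is a hypothetical helper, and it does not reproduce factlog's `date()` term syntax or the engine's projection.

```python
import re
from datetime import date
from typing import Optional

_DOTTED = re.compile(r"(\d{4})\.(\d{1,2})\.(\d{1,2})")


def parse_dotted_date(text: str) -> Optional[date]:
    """Parse 'YYYY.M.D' into a comparable date; return None if it is not one."""
    match = _DOTTED.fullmatch(text.strip())
    if match is None:
        return None
    year, month, day = (int(group) for group in match.groups())
    try:
        return date(year, month, day)
    except ValueError:  # e.g. month 13 or day 32
        return None


raw = ["2017.10.01", "2017.3.8"]  # both parse, so the key below is never None

as_strings = sorted(raw)                        # lexicographic: '1' < '3'
as_dates = sorted(raw, key=parse_dotted_date)   # chronological

print(as_strings)  # October sorts before March
print(as_dates)    # March sorts before October
```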

  • require date()/ordinal()/amount()/number() for typed literals, with a
    prose→term mapping table
  • engine-support note (corrected): date/ordinal/number/amount all
    project to comparable int64. number is fixed-point scaled ×1000 (3
    decimals), so a hand-authored threshold uses scaled integers (V >= 2000,
    not 2.0; an unscaled float fails loud). number AND amount are
    positive-only, so negative-capable values (e.g. an operating loss) cannot
    be made comparable and stay plain strings. amount also needs a unit table.
  • cross-references attribute-relations.md / typed-relations.md
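
The fixed-point note above can be made concrete with a short sketch. It is an assumption-laden illustration, not the code of `literal_types.parse_number_scaled`: `scale_number` is a hypothetical helper, and both accepting zero and rejecting values with more than three decimals are guesses about behavior the PR does not spell out.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional

SCALE = 1000  # three decimal places, per the engine-support note


def scale_number(text: str) -> Optional[int]:
    """Map a non-negative decimal string to a x1000 integer, else None."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return None
    if not value.is_finite() or value < 0:
        return None  # negatives (e.g. an operating loss) are not comparable
    scaled = value * SCALE
    if scaled != scaled.to_integral_value():
        return None  # assumption: more than 3 decimals is rejected
    return int(scaled)


# A threshold of "version >= 2.0" is written in scaled units: V >= 2000.
THRESHOLD = 2000
versions = {"a": "2.5", "b": "1.9", "c": "-672"}
scaled = {name: scale_number(text) for name, text in versions.items()}
matches = [name for name, v in scaled.items() if v is not None and v >= THRESHOLD]

print(scaled)
print(matches)
```

Values that come back as `None` here correspond to the ones the note says should stay plain strings.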

3. Stale #125 note in the init template

#125 (number-type comparison) is closed/implemented, but
factlog/cli.py's typed-relations.md template still said number was "not yet
engine-projectable", which seeded incorrect guidance into every new KB. Updated
to the fixed-point ×1000 reality. (An earlier revision of this PR's
text-to-fact wording repeated the same stale claim; also corrected here.)


Docs/criteria/template only — no engine code paths touched. The reference file
is read at extraction time, so changes are live without reinstall. Verified:
scaffolded typed-relations.md still parses to {} with no warning;
test_typed_literals.sh (9), test_vocab.sh (18) pass; a number KB answers a
scaled version >= 2.0 comparison and rejects an unscaled float threshold.

Promote the typed-literal guidance from discretionary ("may, if clearer")
to directive: dates, amounts, ordinals, and plain numbers MUST be written
as compact compound terms (date()/ordinal()/amount()/number()) instead of
prose strings. Left to discretion the extractor never emits them, so the
engine cannot sort/threshold/range over values that are really comparable.

- mapping table prose -> compound term per type
- honest engine-support note: date/ordinal fully project; amount needs a
  unit table and is positive-int only (use number() for negatives);
  number projection still pending (#125) but emit the term for structure
- cross-reference attribute-relations.md / typed-relations.md so declared
  relations actually project and compare
…ble)

#125 (number-type comparison) is CLOSED: `number` projects to a fixed-point
int64 scaled ×1000 (literal_types.parse_number_scaled), so date/ordinal/number/
amount all compare. The init template (factlog/cli.py) and this PR's earlier
text-to-fact wording still said number was "not yet engine-projectable", which
is wrong and was the source of incorrect guidance.

Fixes both:
- factlog/cli.py typed-relations.md template: number now documented as
  fixed-point ×1000 int64, positive-only, thresholds in scaled units.
- text-to-fact.md: number is projectable; thresholds use scaled integers
  (`V >= 2000`, not `2.0`); number AND amount reject negatives (verified:
  parse_number_scaled('-672') -> None), so negative-capable values (e.g. an
  operating loss) cannot be made comparable and stay plain strings.

Verified: a number KB answers `version >= 2.0` (scaled `V >= 2000`) and an
unscaled float threshold fails loud via _assert_no_unscaled_number_threshold.
Scaffolded typed-relations.md still parses to {} with no warning; typed_literals
(9) and vocab (18) tests pass.
@justinjoy justinjoy merged commit ab539bf into main Jun 27, 2026
3 checks passed
@justinjoy justinjoy deleted the typed-literal-compound-terms branch June 27, 2026 10:38